Training Neural Networks with Stochastic Hessian-Free Optimization
Hessian-free (HF) optimization has been successfully used for training deep
autoencoders and recurrent networks. HF uses the conjugate gradient algorithm
to construct update directions through curvature-vector products that can be
computed on the same order of time as gradients. In this paper we exploit this
property and study stochastic HF with gradient and curvature mini-batches
independent of the dataset size. We modify Martens' HF for these settings and
integrate dropout, a method for preventing co-adaptation of feature detectors,
to guard against overfitting. Stochastic Hessian-free optimization gives an
intermediary between SGD and HF that achieves competitive performance on both
classification and deep autoencoder experiments.
Comment: 11 pages, ICLR 201
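The core idea in this abstract, building an update direction with conjugate gradient from curvature-vector products alone, can be illustrated with a minimal sketch. It is not the paper's implementation: the toy quadratic loss, the finite-difference curvature product, and the names `hessian_vector_product`, `conjugate_gradient`, and `damping` are assumptions made for illustration, and no mini-batching or dropout is shown.

```python
import numpy as np

def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    """Approximate H @ v with a central difference of two gradient calls.

    The Hessian is never formed, so the cost is on the order of a gradient.
    """
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)

def conjugate_gradient(matvec, b, iters=50, tol=1e-10):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()              # residual b - A x, with x = 0
    p = r.copy()              # first search direction
    rs = r @ r
    for _ in range(iters):
        if rs < tol:          # squared residual norm small enough
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy quadratic loss 0.5 * w^T A w - b^T w (an assumption, not from the paper).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_fn = lambda w: A @ w - b

w = np.zeros(2)
damping = 0.0                 # hypothetical damping strength; 0 disables it
g = grad_fn(w)
step = conjugate_gradient(
    lambda v: hessian_vector_product(grad_fn, w, v) + damping * v, -g
)
w_new = w + step              # one curvature-aware update
```

On this quadratic a single update lands on the minimizer, which is why the example is easy to check. In a stochastic setting, `grad_fn` would be evaluated on a mini-batch and `damping` would be nonzero.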
A Multiplicative Model for Learning Distributed Text-Based Attribute Representations
In this paper we propose a general framework for learning distributed
representations of attributes: characteristics of text whose representations
can be jointly learned with word embeddings. Attributes can correspond to
document indicators (to learn sentence vectors), language indicators (to learn
distributed language representations), meta-data and side information (such as
the age, gender and industry of a blogger) or representations of authors. We
describe a third-order model where word context and attribute vectors interact
multiplicatively to predict the next word in a sequence. This leads to the
notion of conditional word similarity: how meanings of words change when
conditioned on different attributes. We perform several experimental tasks
including sentiment classification, cross-lingual document classification, and
blog authorship attribution. We also qualitatively evaluate conditional word
neighbours and attribute-conditioned text generation.
Comment: 11 pages. An earlier version was accepted to the ICML-2014 Workshop
on Knowledge-Powered Deep Learning for Text Minin
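As a rough illustration of a multiplicative interaction between a word context and an attribute vector, the sketch below uses a factored three-way form. The sizes, the random parameters, the averaging of context embeddings, and the names `W_fc`, `W_fa`, `W_fv`, and `next_word_probs` are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, attr_dim, factors = 6, 4, 3, 5   # toy sizes (assumed)

E = rng.normal(size=(vocab, dim))            # word embeddings
W_fc = rng.normal(size=(factors, dim))       # context   -> factors
W_fa = rng.normal(size=(factors, attr_dim))  # attribute -> factors
W_fv = rng.normal(size=(factors, vocab))     # factors   -> word scores

def next_word_probs(context_ids, attribute):
    """Next-word distribution where the attribute gates the context."""
    c = E[context_ids].mean(axis=0)          # simple context summary
    f = (W_fc @ c) * (W_fa @ attribute)      # multiplicative interaction
    scores = W_fv.T @ f
    scores -= scores.max()                   # numerically stable softmax
    p = np.exp(scores)
    return p / p.sum()

# The same context under two different (random, hypothetical) attributes.
attr_a = rng.normal(size=attr_dim)
attr_b = rng.normal(size=attr_dim)
p_a = next_word_probs([0, 2], attr_a)
p_b = next_word_probs([0, 2], attr_b)
```

The elementwise product is equivalent to contracting the context and attribute with a third-order tensor whose entries are sums over the factors, while storing only three matrices. Because the attribute rescales every factor, the same context yields different predictions under different attributes, which is the intuition behind conditional word similarity.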
sk_p: a neural program corrector for MOOCs
We present a novel technique for automatic program correction in MOOCs,
capable of fixing both syntactic and semantic errors without manual,
problem-specific correction strategies. Given an incorrect student program, it
generates candidate programs from a distribution of likely corrections, and
checks each candidate for correctness against a test suite.
The key observation is that in MOOCs many programs share similar code
fragments, and the seq2seq neural network model, used in the natural-language
processing task of machine translation, can be modified and trained to recover
these fragments.
Experiments show that our scheme can correct 29% of all incorrect submissions
and outperforms a state-of-the-art approach that requires manual,
problem-specific correction strategies.
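The generate-and-check loop described in this abstract can be sketched as follows. This is only an illustration: the candidate list is hard-coded where the paper samples from a trained seq2seq model, and the names `passes_tests`, `first_correct`, and `solution` are hypothetical. Running `exec` on untrusted submissions would need a sandbox in practice.

```python
def passes_tests(source, tests, func_name="solution"):
    """Return True if the candidate program passes every test case."""
    namespace = {}
    try:
        exec(source, namespace)        # assumption: candidates are trusted here
        fn = namespace[func_name]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False

def first_correct(candidates, tests):
    """Return the first candidate that passes the test suite, or None."""
    for source in candidates:
        if passes_tests(source, tests):
            return source
    return None

# Test suite for a "square the input" exercise (hypothetical).
tests = [((2,), 4), ((3,), 9), ((-1,), 1)]

# Stand-in for candidates drawn from a distribution of likely corrections,
# ordered from most to least likely.
candidates = [
    "def solution(x):\n    return x * 2\n",
    "def solution(x):\n    return x ** x\n",
    "def solution(x):\n    return x * x\n",
]

fixed = first_correct(candidates, tests)
```

The test suite acts as the correctness oracle, so the generator only needs to place a correct program somewhere among its candidates.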
Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks
Recommendations can greatly benefit from good representations of the user
state at recommendation time. Recent approaches that leverage Recurrent Neural
Networks (RNNs) for session-based recommendations have shown that Deep Learning
models can provide useful user representations for recommendation. However,
current RNN modeling approaches summarize the user state by only taking into
account the sequence of items that the user has interacted with in the past,
without taking into account other essential types of context information such
as the associated types of user-item interactions, the time gaps between events
and the time of day for each interaction. To address this, we propose a new
class of Contextual Recurrent Neural Networks for Recommendation (CRNNs) that
can take contextual information into account in two ways: in the input and
output layers, by combining the context embedding with the item embedding, and,
more explicitly, in the model dynamics, by parametrizing the hidden unit
transitions as a function of context information.
We compare our CRNN approach with RNNs and non-sequential baselines and show
good improvements on the next-event prediction task.
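The two uses of context named in this abstract, combining the context embedding with the item embedding and making the hidden transition depend on context, can be sketched with a plain tanh RNN. This swaps in a simpler cell than the paper may use, shows context only at the input rather than also at the output layer, and all sizes, random weights, and names (`U_ctx`, `step`, `next_item_probs`) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_ctx, d_item, d_ctx, d_hid = 5, 3, 4, 2, 6   # toy sizes (assumed)

item_emb = rng.normal(scale=0.5, size=(n_items, d_item))
ctx_emb = rng.normal(scale=0.5, size=(n_ctx, d_ctx))
W_in = rng.normal(scale=0.5, size=(d_hid, d_item + d_ctx))
U_ctx = rng.normal(scale=0.5, size=(n_ctx, d_hid, d_hid))  # one transition matrix per context
W_out = rng.normal(scale=0.5, size=(n_items, d_hid))

def step(h, item_id, ctx_id):
    """One recurrent step with context at the input and in the transition."""
    # Context at the input: concatenate item and context embeddings.
    x = np.concatenate([item_emb[item_id], ctx_emb[ctx_id]])
    # Context in the dynamics: the hidden-to-hidden matrix is chosen by context.
    return np.tanh(W_in @ x + U_ctx[ctx_id] @ h)

def next_item_probs(events):
    """Distribution over the next item after a session of (item, context) events."""
    h = np.zeros(d_hid)
    for item_id, ctx_id in events:
        h = step(h, item_id, ctx_id)
    scores = W_out @ h
    scores -= scores.max()                 # numerically stable softmax
    p = np.exp(scores)
    return p / p.sum()

# Same item sequence; only the context of the last event differs
# (for example, context 0 = view and context 1 = purchase, both hypothetical).
p_view = next_item_probs([(0, 0), (1, 0), (2, 0)])
p_buy = next_item_probs([(0, 0), (1, 0), (2, 1)])
```

An item-only RNN would assign both sessions the same prediction, whereas here the final context changes both the input and the transition applied to the hidden state.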